Unsupervised Grounding of Spatial Relations
Authors
Abstract
We present an unsupervised connectionist model for grounding the color, shape and spatial relations of two objects in 2D space. The model is a two-layer architecture that integrates visual and auditory inputs. Images are presented as visual input to an artificial retina, and five-word sentences describing them (e.g. “Red box above green circle”) serve as phonologically encoded auditory input. The visual scene is represented by one or more Self-Organizing Maps (SOMs), and the auditory description is processed by a recursive SOM (RecSOM), which learns topographic representations of sequences. These primary representations are integrated in the second layer by a multimodal module, implemented with either the SOM or the Neural Gas (NG) algorithm, whose self-organizing units develop conjunctive representations. We tested two versions of the two-layer architecture (a single SOM representing color, shape and spatial relations vs. biologically inspired separate SOMs for spatial relations and for shape and color) under several conditions, with scene complexity ranging up to 3 colors, 5 object shapes and 4 spatial relations. In the more complex scenes, NG in the multimodal layer gave better results than SOM. We attribute this advantage to the flexible neighborhood relations in NG, which relax topographic organization. The results confirm theoretical assumptions about the different nature of visual and auditory coding. The model thus integrates the two sources of information efficiently while reflecting their specific features.
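The abstract credits NG's advantage over SOM in the multimodal layer to NG's flexible neighborhood relations. The sketch below is not the authors' implementation. It illustrates that difference using the standard textbook update rules:

- **SOM:** the neighborhood is a Gaussian of distance on a fixed 2D lattice around the best-matching unit.
- **NG:** there is no lattice; each unit is updated according to the rank of its distance to the input.

All function names (`som_step`, `ng_step`, `train`, `decay`, `quantization_error`), the square lattice, the learning-rate and neighborhood schedules, and the toy data are hypothetical choices made for illustration. The RecSOM's recurrent context is not modeled.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def som_step(weights, grid, x, lr, sigma):
    """One SOM update: the neighborhood is measured on a fixed 2D grid
    around the best-matching unit (BMU). Updates `weights` in place."""
    bmu = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
    for i, w in enumerate(weights):
        # Gaussian of the grid (not input-space) distance to the BMU
        h = math.exp(-sq_dist(grid[i], grid[bmu]) / (2 * sigma ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu


def ng_step(weights, x, lr, lam):
    """One Neural Gas update: no grid; each unit's neighborhood strength
    depends only on its distance rank to the input. Updates `weights` in place."""
    order = sorted(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
    for rank, i in enumerate(order):
        h = math.exp(-rank / lam)  # rank 0 (the winner) gets h = 1
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(weights[i], x)]
    return order[0]


def decay(start, end, t, t_max):
    """Exponential schedule from `start` (t = 0) towards `end` (t = t_max)."""
    return start * (end / start) ** (t / t_max)


def train(data, n_units, steps, algo="ng", seed=0):
    """Hypothetical training loop with arbitrary illustrative schedules."""
    if algo not in ("som", "ng"):
        raise ValueError("algo must be 'som' or 'ng'")
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    side = math.ceil(math.sqrt(n_units))
    grid = [(i // side, i % side) for i in range(n_units)]  # assumed SOM lattice
    for t in range(steps):
        x = rng.choice(data)
        lr = decay(0.5, 0.01, t, steps)
        if algo == "som":
            som_step(weights, grid, x, lr, decay(max(side / 2, 0.5), 0.5, t, steps))
        else:
            ng_step(weights, x, lr, decay(max(n_units / 2, 0.5), 0.5, t, steps))
    return weights


def quantization_error(weights, data):
    """Mean squared distance from each input to its nearest unit."""
    return sum(min(sq_dist(w, x) for w in weights) for x in data) / len(data)


if __name__ == "__main__":
    corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for algo in ("som", "ng"):
        w = train(corners, 4, 2000, algo=algo)
        print(algo, quantization_error(w, corners))
```

Running the module prints the quantization error of each algorithm on four toy points. The values depend on the arbitrary schedules chosen here and say nothing about the paper's results. The design point is that the SOM couples units through a lattice fixed in advance, while NG recomputes neighborhoods from input-space ranks at every step. NG therefore has no topographic constraint to satisfy.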